Automatic Extraction of Subcategorization Frames for Corpus-based Dictionary-building

Author

  • Susanne Gahl
Abstract

This paper presents a method for automatically extracting subcorpora that isolate different subcategorization frames for nouns, adjectives, and verbs in the 100-million-word British National Corpus (BNC). The tool is being used in the FrameNet project, an NSF-funded project that is producing a database and tools for dictionary-building based on the principles of Frame Semantics. The subcorpora are used (1) to facilitate the selection of corpus lines illustrating the full range of semantic and syntactic combinatory possibilities of a given lemma, and (2) to determine the relative frequencies of the different syntactic contexts of each lemma in the database. The resulting database, which will be both human- and computer-readable, will be a rich resource for lexicographers as well as for researchers in lexicology and natural language processing.

Keywords: dictionary-building, corpus linguistics, subcategorization extraction, Frame Semantics
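Use (2) above, determining relative frequencies of syntactic contexts, amounts to normalizing per-lemma frame counts. The sketch below is a minimal illustration, assuming each extracted corpus line has already been assigned to one subcorpus (one frame). The function name `relative_frame_frequencies` and the frame labels are hypothetical and are not taken from the paper's CQP-based tool.

```python
from collections import Counter, defaultdict


def relative_frame_frequencies(observations):
    """Return {lemma: {frame: relative frequency}} from (lemma, frame) pairs.

    Hypothetical helper: assumes each pair stands for one corpus line that
    was placed in the subcorpus for that frame.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1  # count corpus lines per lemma and frame

    result = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())  # all lines found for this lemma
        result[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return result


# Hypothetical frame labels, for illustration only.
sample = [
    ("argue", "NP_V_that-S"),
    ("argue", "NP_V_that-S"),
    ("argue", "NP_V_PP[about]"),
    ("argue", "NP_V"),
    ("eager", "ADJ_to-VP"),
]
print(relative_frame_frequencies(sample))
# {'argue': {'NP_V_that-S': 0.5, 'NP_V_PP[about]': 0.25, 'NP_V': 0.25}, 'eager': {'ADJ_to-VP': 1.0}}
```

Relative rather than raw frequencies keep lemmas of very different corpus frequency comparable. Note that a real extraction would also have to deal with lines that match no frame or more than one.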


Similar resources

Automatic Extraction of Subcategorization from Corpora

We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accur...


Automatic Acquisition of a Large Subcategorization Dictionary from Corpora

This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcat...


Automatic Extraction of Subcorpora based on Subcategorization Frames from a Part-of-Speech Tagged Corpus

This paper presents a method for extracting subcorpora documenting different subcategorization frames for verbs, nouns, and adjectives in the 100-million-word British National Corpus. The extraction tool consists of a set of batch files for use with the Corpus Query Processor (CQP), which is part of the IMS corpus workbench (cf. Christ 1994a,b). A macroprocessor has been developed that allows th...


LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon

This paper introduces LexFr, a corpus-based French lexical resource built by adapting the framework LexIt, originally developed to describe the combinatorial potential of Italian predicates. As in the original framework, the behavior of a group of target predicates is characterized by a series of syntactic (i.e., subcategorization frames) and semantic (i.e., selectional preferences) statistical...


Automatic Extraction of Subcategorization Frames for Bulgarian

Knowledge of a verb's valency or subcategorization is essential for many NLP tasks. The present paper describes an attempt to learn this kind of information from a corpus of parsed sentences of Bulgarian. Our program acquired the subcategorization information for 38 verbs and achieved 87.7% precision and 68.3% recall. We did not use predefined sets of frames but automatically induced such from a ...



Publication date: 1999